Abstract


Relations provide users with a simple, powerful model to
view data bases. Relational languages offer considerable
advantages for both applications programmers and casual users of
a data base system. Many difficult problems need solution for
the implementation of a relational point of view. This paper
discusses some implementation techniques for relations. Pointer
arrays are proposed as the major building block of a relational
system. An outline of an implementation is given. Hierarchies
and their use for implementing relations are considered. A trace
implementation is outlined. Finally, the concurrent update
problem is discussed. Work is currently going on associated with
the Zeta project and EDBS project at the University of Toronto.

A Data Base Management System (DBMS) provides a facility
to store and retrieve data, change data and manipulate
relationships between data. An Operating System usually provides
some low level facilities on which a DBMS is based. They include
software processes, some synchronization and message passing
primitives, a protection mechanism, and at least one elementary
access method. We will make the assumption that a basic direct
access method (BDAM) is available. Given F and i, the access
method can locate and access the ith record of a file F.
This function is common and is compatible both with one level
store concepts and with expectations of large, cheap main
memories.


We will adopt a relational point of view with respect to
basic structures and commands [Codd 70, 71]. A relation can be
thought of as an array whose columns correspond to different
attributes and whose rows (or tuples) correspond to different
entities. The commands implemented by the DBMS are defined by a
language which is logically complete for manipulating relations
[Codd 70, 71]. Examples of languages for this purpose can be
found in the literature, e.g., alpha-language [Codd 70, 71],
Collard [Bracchi et al 72] and Square [Boyce et al 73]. Other
applications oriented languages can be implemented on top of the
DBMS.

We will assume that the users generate requests through
a user-oriented query language. The requests specify three
entities: R, D and Q. R is a named, predeclared relation.
D is a subset of the domains of R participating in the
request. Q is a boolean expression involving domains,
operators, and values which qualifies (or selects) some of the
tuples of R. Each request asks for an action to be performed,
such as retrieval or update. We will not elaborate on the form
in which the requests are presented. Rather, we will investigate
the mechanisms necessary to service them.
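
As a minimal sketch of the request model just described, a retrieval can be pictured as selecting the tuples of R that satisfy Q and keeping only the domains in D. The relation contents, the domain names and the function name `retrieve` are illustrative assumptions, not part of either system.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical in-memory stand-in for a named, predeclared relation R.
EMPLOYEE: List[Row] = [
    {"name": "Jones", "dept": "toys", "salary": 9000},
    {"name": "Smith", "dept": "shoes", "salary": 12000},
    {"name": "Brown", "dept": "toys", "salary": 15000},
]


def retrieve(relation: List[Row], domains: List[str],
             qualification: Callable[[Row], bool]) -> List[Row]:
    """Service a retrieval request (R, D, Q).

    Q selects tuples of R; D names the domains kept in the answer.
    """
    return [{d: row[d] for d in domains}
            for row in relation if qualification(row)]


# Q: dept = 'toys' AND salary > 10000, with D = {name}
result = retrieve(EMPLOYEE, ["name"],
                  lambda r: r["dept"] == "toys" and r["salary"] > 10000)
```

An update request would use the same R and Q to locate tuples and then change the values of the domains in D.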

We will outline two implementations of relations as they
appear in two on-going projects at the University of Toronto.
The first implementation is based on unary relations (Zeta
project). The second implementation is based on tree structures
(EDBS project). We will discuss different problems and proposed
solutions in the context of the two systems. We cannot
experiment with all aspects of relation implementation. As a
result some of the issues will be either ignored or only casually
mentioned.

2. Basic Elements

In the first project (Zeta project) the design is based
on a simple, widely applicable building block. Relations in
their general form are usually implemented on a simpler, more
basic type of data structure. This approach provides economy of
concept, a simple and clean structure, and ease of
implementation. General n-ary relations can be implemented on
more restricted binary relations, e.g., XRAM [Lorie; Lorie
and Symonds 72].

We propose the unary relation as a basic mechanism to
implement relations. A unary relation is essentially an array
(column, vector) of similar entities. In our case we will
restrict the mechanism even further by looking at arrays of
pointers. Pointer arrays have been proposed elsewhere as a basic
implementation tool [DBTG 71]. We will investigate their
application in a relation implementation.

A basic element is a data structure consisting of a
header and a body. The header is a small fixed size descriptor
which serves as a control block to an ordered set of slots. The
number of slots (i.e., size of the body) can vary. Each slot is
of fixed size and may or may not be empty. A slot's contents are
not necessarily unique. It will usually be interpreted as an
offset into a conventional file or an index to another basic
element. A basic element will usually be interpreted as a set
of pointers directed at records of a conventional file. We
outline an implementation in section 3.

There is a set of standard commands which can be used to
manipulate basic elements. Using such commands one can create or
destroy basic elements, increase or decrease the body sizes and
search or change the values in the slots. We will not discuss
the function and format of the commands in detail.
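
Since the command set is not specified in detail, the following is only a sketch of what a basic element and a few of its commands might look like. The class name, field names and method names are assumptions made for illustration; an empty slot is modelled as `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BasicElement:
    """A header (small descriptor) plus a variable-size body of slots."""
    kind: str  # header field: how the slot contents are interpreted
    body: List[Optional[int]] = field(default_factory=list)  # None = empty

    def resize(self, new_size: int) -> None:
        """Increase (with empty slots) or decrease the body size."""
        if new_size < len(self.body):
            del self.body[new_size:]
        else:
            self.body.extend([None] * (new_size - len(self.body)))

    def set_slot(self, i: int, value: Optional[int]) -> None:
        """Change the value held in slot i."""
        self.body[i] = value

    def search(self, value: int) -> List[int]:
        """Return every slot index holding value (contents need not be unique)."""
        return [i for i, v in enumerate(self.body) if v == value]


# Create an element whose slots are offsets into a conventional file.
e = BasicElement(kind="file-offsets")
e.resize(4)
e.set_slot(0, 7)
e.set_slot(2, 7)
e.set_slot(3, 1)
```

Destroying an element corresponds here to simply dropping the object; a real system would return the header and body to their allocators.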


We will outline briefly some important facilities which
can be provided using basic elements. Some of the facilities are
sketched in a rather naive way. The objective is not to propose
realistic schemes, but rather to illustrate the usefulness of basic
elements.


2.1 Relations

Consider a file implementing a relation in the natural
way. Namely, each record of the file corresponds to a tuple of
the relation. We will call such a relation in our system a
primary relation. Not all relations have to be implemented in
such a way. Relations may share domains and/or tuples.
Implementing them as separate files can be both wasteful and
error prone. It is wasteful because we have to duplicate data.
It is error prone because the redundancy generates complicated
consistency problems when modifying or deleting tuples. There is
a need to implement relations using pointer structures without
duplication of data. Basic elements can provide such a facility.
A relation implemented using basic elements will be called a
derived relation as opposed to a primary relation.

It should be obvious that a basic element can be used to
represent any subrelation of a primary relation. The pointers in
the basic element's body point to the appropriate tuples. At the
same time we limit our attention to certain domains of the
primary relation. We will investigate the possibility of
representing derived relations involving more than one primary
relation. As an example we will discuss joins of relations [Codd
70, 71].

Basic elements offer a natural way to represent joins of
relations. Consider two relations R1 and R2 and their join
according to a common domain D. The join of R1 and R2 can
be thought of as a subrelation T1 of R1 together with a
subrelation T2 of R2. Each row of T1 corresponds to a row
of T2 according to the value of D which they share. Two
basic elements can be combined "in parallel" to represent the
join of R1 and R2. A basic element E1 is constructed
corresponding to subrelation T1 and another element E2
corresponding to subrelation T2. Furthermore, the order of
slots in the bodies of E1 and E2 gives the correspondence of
elements of T1 and T2. That is, the row of T1 pointed to by
the pth slot of E1 concatenated with the row of T2
pointed to by the pth slot of E2 gives a row of the join of
R1 and R2.

We did not elaborate on how the join is constructed,
that is, how basic elements E1 and E2 are obtained. It is
imperative not to bind such an important decision early in the
design. It is sufficient to point out that the commands on basic
elements can be used to construct the appropriate elements
E1,...,En corresponding to the join. Joins of derived relations
can be constructed, but it is necessary to use basic elements
referring indirectly to derived relations.
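
To make the "in parallel" reading concrete, the sketch below represents a join by two pointer arrays E1 and E2 of equal length. The construction shown (a naive nested loop) is purely an assumption for illustration, since the text deliberately leaves the construction method open; the sample relations and function names are likewise hypothetical.

```python
from typing import List, Tuple

# Primary relations stored as lists of tuples; domain D is column 0 in both.
R1 = [("toys", "Jones"), ("shoes", "Smith"), ("toys", "Brown")]
R2 = [("shoes", 2), ("toys", 1)]


def build_join(r1, r2, d1: int, d2: int) -> Tuple[List[int], List[int]]:
    """Return pointer arrays (E1, E2): slot p of each names one joined pair."""
    e1, e2 = [], []
    for i, t1 in enumerate(r1):
        for j, t2 in enumerate(r2):
            if t1[d1] == t2[d2]:  # rows share the same value of D
                e1.append(i)
                e2.append(j)
    return e1, e2


def materialize(r1, r2, e1: List[int], e2: List[int]) -> list:
    """Row p of the join = row E1[p] of R1 concatenated with row E2[p] of R2."""
    return [r1[p] + r2[q] for p, q in zip(e1, e2)]


E1, E2 = build_join(R1, R2, 0, 0)
joined = materialize(R1, R2, E1, E2)
```

No tuple data is copied until `materialize` is called; the derived relation itself consists only of the two pointer arrays.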

2.2 Additional Uses

The inverted file is an essential tool in a DBMS for
assisting in searches. Basic elements can be used to implement
inverted files. For a particular domain D, we generate a basic
element, Ei, corresponding to each and every value, di, of
D. The slots of Ei point to all tuples having value di for
D. A special header S is constructed which links all the Ei's
together. If the inverted domain is a key, it is rather wasteful
to implement the inverted file with such a combination of basic
elements, since each basic element will point to only a single
tuple. Rather, we can construct one basic element with its
pointers ordered according to a natural order of the key. Then a
domain value will give us the position pointer directly.
Alternatively, keys can be hashed into slot indices.
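
Both variants of the inverted file can be sketched as follows. The dictionary stands in for the special header S linking the Ei's, and the binary search used for the key case is an assumption about how an ordered pointer array might be probed; the sample relation and all function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Optional

EMP = [
    {"id": 30, "dept": "toys"},
    {"id": 10, "dept": "shoes"},
    {"id": 20, "dept": "toys"},
]


def invert(relation: List[dict], domain: str) -> Dict[object, List[int]]:
    """One pointer array Ei per distinct value di of the domain."""
    index = defaultdict(list)
    for pos, row in enumerate(relation):
        index[row[domain]].append(pos)
    return dict(index)


def invert_key(relation: List[dict], key: str) -> List[int]:
    """Key domain: a single pointer array ordered by key value."""
    return sorted(range(len(relation)), key=lambda pos: relation[pos][key])


def lookup_key(relation: List[dict], key: str,
               ordered: List[int], value) -> Optional[int]:
    """Binary search over the ordered pointers; None if the key is absent."""
    keys = [relation[pos][key] for pos in ordered]
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return ordered[i]
    return None


by_dept = invert(EMP, "dept")
by_id = invert_key(EMP, "id")
```

The hashing alternative mentioned in the text would replace `lookup_key` by a hash of the key value into a slot index.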

Basic elements can also be used throughout the system
for other purposes. For example, a primary relation R can change
by both updates and insertions. We may want derived relations
based on R to be updated either automatically or periodically.
In the automatic case every time an element changes in R we
check the effect on the derived relation and update the basic
element pointers. Automatically updating all derived relations
affected can be very expensive. Alternatively, we can
periodically straighten out the basic element pointers. A list
of all updated rows of a relation R is very useful for this
approach. Such a list can be maintained using another basic
element pointing at changed rows of R.
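
The periodic approach might look roughly like the sketch below, where a derived subrelation is a pointer array and a second pointer array records the rows changed since the last refresh. The function name `refresh` and the use of a predicate to define the subrelation are assumptions for illustration.

```python
from typing import Callable, List


def refresh(relation: List[dict], predicate: Callable[[dict], bool],
            derived: List[int], changed: List[int]) -> List[int]:
    """Re-examine only the changed rows and return the corrected pointer array.

    derived: row indices that satisfied the predicate at the last refresh.
    changed: row indices updated or inserted since then (emptied on return).
    """
    members = set(derived)
    for pos in changed:
        if predicate(relation[pos]):
            members.add(pos)
        else:
            members.discard(pos)
    changed.clear()
    return sorted(members)


R = [{"dept": "toys"}, {"dept": "shoes"}, {"dept": "toys"}]
toys = [0, 2]                 # derived relation: rows with dept = toys
changed: List[int] = []

R[0]["dept"] = "shoes"        # update: row 0 leaves the subrelation
changed.append(0)
R.append({"dept": "toys"})    # insertion: row 3 joins it
changed.append(3)

toys = refresh(R, lambda row: row["dept"] == "toys", toys, changed)
```

The cost of a refresh is proportional to the number of changed rows rather than to the size of R.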




3. An Implementation Using Basic Elements


A relation implementation will be outlined using basic
elements. This approach is followed for the design of a
relational prototype system at the University of Toronto (Zeta
project). Only the important characteristics will be discussed.
A detailed design is outside the scope of this paper.

The "schema" of the proposed system consists mainly of
two tables, the relation table and the domain table. Both tables
are relations themselves. We call them 'tables' to avoid
confusion with the terminology used for the relations which they
implement. In parentheses we will list tentative sizes for
entries in the tables to give a rough indication of expected
table size.

The relation table records all relations, both primary
and derived, known to the system. For each relation it has a
tuple with the following attributes:

N (8 bytes). Relation name for every declared relation
in the system, either primary or derived. All names are unique at
this level of the system.

J (16 bytes). Domains of the corresponding relation.
The domains are denoted indirectly as pointers to the domain
table.

T (8 bytes). Type of the relation, e.g., primary,
subrelation, join, etc. File name is entered in case of primary
relation. Pointer to basic element(s) is entered in case of
derived relation.

S (4 bytes). Security field and locking fields. We
will be discussing locking problems separately in section 4.

The domain table contains one entry for each domain
known to the system. In this case we define domain to be an
attribute of a primary relation (field of a record of a file
implementing a relation). It does not have the usual
mathematical meaning of domain as set of values. Role names do
not necessarily have an entry in the domain table [Codd 70, 71].
A domain of a derived relation does not have an entry in the
domain table. The domain table has the following attributes:

M (8 bytes). Domain name.

I (2 bytes). Pointer to inverted file if domain is
inverted.

X (8 bytes). Relation names for which the domain is an
attribute. They are denoted indirectly as pointers to the
relation table.

E (4 bytes). Security and locking fields for the
domain.

Suppose we restrict the number of known relations to 256
and known domains to 256. Then one byte can be used to identify
an entry of either the domain or relation table. On the basis of
[...] 16 [...]
relations. In addition, the size of both the relation and domain
table  is  small  enough  for  them  to  be  core  resident.  Note  that 
all  these  numbers  are  tentative  to  give  an  idea  of  the 
implementation.  In  a  large  system  they  will  be  quite  different. 
On  the  other  hand,  this  does  not  present  any  conceptual  problem. 
After  all  the  tables  themselves  are  relations.  They  can  be 
partitioned,  stored  and  retrieved  like  any  other  relation  of  the 
system. 
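
The claim that both tables can be core resident follows from simple arithmetic on the tentative entry sizes listed earlier. The short calculation below assumes the limit of 256 entries per table; the variable and function names are illustrative only.

```python
# Tentative attribute sizes in bytes, as listed for each table.
RELATION_TABLE = {"N": 8, "J": 16, "T": 8, "S": 4}
DOMAIN_TABLE = {"M": 8, "I": 2, "X": 8, "E": 4}
MAX_ENTRIES = 256  # assumed limit: one byte identifies an entry


def table_bytes(layout: dict, entries: int = MAX_ENTRIES) -> int:
    """Total size of a table = bytes per entry times number of entries."""
    return sum(layout.values()) * entries


relation_bytes = table_bytes(RELATION_TABLE)  # 36 bytes per entry
domain_bytes = table_bytes(DOMAIN_TABLE)      # 22 bytes per entry
total_bytes = relation_bytes + domain_bytes
```

Under these assumptions the two tables together occupy 14848 bytes, that is, under 15K bytes of main memory.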

One  control  monitor  is  in  charge  of  both  tables.  Only 
the  control  monitor  can  alter  the  contents  of  the  relation  and 
domain  tables.  Any  request  to  create,  destroy,  read  or  update 
any  relation  in  the  system  has  to  be  cleared  first  by  the 
monitor.  The  monitor  enforces  security  provisions  for  every 
request.  In  addition,  it  controls  concurrency  in  the  system. 

Every  relation  known  in  the  system  is  predeclared  and  an 
entry  is  made  in  the  relation  table.  If  the  relation  is  primary, 
then  a  file  corresponds  to  the  relation  in  a  natural  way.  If  the 
relation is derived it is represented in the system by a set of
basic elements.

Headers of basic elements are fixed size (6 bytes) and
are allocated consecutively in a table. An index to the table
uniquely identifies a header. Headers can point directly to
bodies or they can be used as links to point to other headers.
Headers are of different types.

1) Simple header. It points to the pointer array comprising
its body. It contains fields specifying its type, the relation
to which it corresponds (i.e., a back pointer to the relation
table), its body's starting address and its body's size.

2)  Serial  header.  It  points  to  other  headers  whose  bodies 
should  be  considered  serially.  It  contains  a  type  field  and  from 
one  to  four  pointers  pointing  to  other  headers.  By  nesting 
serial  headers,  one  can  obtain  a  tree  of  pointer  arrays. 

3) Parallel header. It points to a set of simple or serial
headers whose bodies should be considered in parallel, that is,
their slots are considered together according to their order. It
can represent a join or other derived relation across more than
one primary relation. It contains its type and from one to four
pointers pointing to other headers.

4) Chained header. It points to a body and another header
which is chained serially or in parallel to this basic element.
It contains type, relation index, body starting address and size,
and next and previous elements in the chain. Chained headers
give the ability of joining more than four relations. In
addition they can be used to consider serially more than four
basic elements without using a tree structure of serial headers.
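
The way simple, serial and parallel headers combine can be sketched as a small recursive traversal of the header table. The class names and the function `resolve` are assumptions for illustration; the 6-byte header layout is not modelled, and chained headers are omitted for brevity. Each resolved row is a tuple of pointers, one per parallel column.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Simple:
    body: List[int]        # the pointer array itself


@dataclass
class Serial:
    children: List[int]    # one to four header-table indices


@dataclass
class Parallel:
    children: List[int]    # one to four header-table indices


Header = Union[Simple, Serial, Parallel]


def resolve(table: List[Header], h: int) -> List[Tuple[int, ...]]:
    """Expand header h into rows of pointers."""
    header = table[h]
    if isinstance(header, Simple):
        return [(p,) for p in header.body]
    parts = [resolve(table, c) for c in header.children]
    if isinstance(header, Serial):
        # Bodies are considered one after the other.
        return [row for part in parts for row in part]
    # Parallel: the pth slots of all children are considered together.
    return [sum(rows, ()) for rows in zip(*parts)]


headers: List[Header] = [
    Simple([0, 1]),        # 0
    Simple([2]),           # 1
    Serial([0, 1]),        # 2: bodies 0 and 1 taken serially
    Simple([1, 0, 1]),     # 3
    Parallel([2, 3]),      # 4: e.g. a join across two primary relations
]
```

Nesting serial headers in this way yields the tree of pointer arrays mentioned above.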

For some applications, such as inverted files, we may
need short fixed length blocks to store values. Such blocks can
be implemented by the same mechanism which allocates headers. In
addition, inverted files, if implemented in terms of basic
elements, may need special kinds of headers. We do not consider
these special requirements in this paper. It is very simple to
construct other special kinds of headers with different types.
Each type corresponds to a different interpretation of the slots'
contents.

The ability to link basic elements eliminates the need
of complete variability of their body sizes. One body size might
be adequate. Another solution is to have several body sizes, for
instance, 16, 64 and 256. It is very simple to manage two or
three sizes with a modified version of the buddy system.
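
The text does not say how the buddy system would be modified. One possible reading, given that each size is four times the previous one, is a four-way split and merge, sketched below. The class `BodyPool` and its methods are hypothetical.

```python
SIZES = (16, 64, 256)  # body sizes in slots; each is 4 times the previous


class BodyPool:
    """Four-way buddy allocator over a region of slots (illustrative only)."""

    def __init__(self, total_slots: int):
        assert total_slots % 256 == 0
        self.free = {s: set() for s in SIZES}
        self.free[256] = set(range(0, total_slots, 256))

    def alloc(self, n: int):
        """Return (address, size) of the smallest body holding n slots."""
        size = next((s for s in SIZES if s >= n), None)
        if size is None:
            raise ValueError("larger bodies must be built by linking elements")
        return self._take(size), size

    def _take(self, size: int) -> int:
        if self.free[size]:
            addr = min(self.free[size])
            self.free[size].remove(addr)
            return addr
        if size == SIZES[-1]:
            raise MemoryError("pool exhausted")
        parent = self._take(size * 4)
        # Split the larger block in four; keep the first, free the rest.
        for k in range(1, 4):
            self.free[size].add(parent + k * size)
        return parent

    def release(self, addr: int, size: int) -> None:
        self.free[size].add(addr)
        if size < SIZES[-1]:
            parent = addr - addr % (size * 4)
            buddies = {parent + k * size for k in range(4)}
            if buddies <= self.free[size]:
                # All four buddies are free: merge them into the larger block.
                self.free[size] -= buddies
                self.release(parent, size * 4)


pool = BodyPool(256)
a = pool.alloc(10)   # needs a 16-slot body
b = pool.alloc(50)   # needs a 64-slot body
```

Releasing both bodies merges the pieces back into the single 256-slot block.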

We  will  not  give  complete  details  on  how  a  typical  query 
can  be  serviced  given  the  above  structure.  We  hope  nevertheless 
to  have  given  the  essence  of  a  possible  implementation  of  a 
relational  system  using  unary  relations. 

4. Hierarchical Implementation

In the second project (EDBS) we implement relations on
tree structures. Consider a small hierarchical data base as in
Figure 1. Figure 1a gives the type tree for some geographical
data organized in a hierarchical structure. Each state can have
many counties which can have in turn many cities. The state,
county, city fields are keys; the rest of the fields contain
other relevant data. The hierarchical structure implies a set of
relations as in Figure 1b. In addition, it implements a certain
structure among the relation tuples. This structure enables fast
searching for certain queries, for instance "get all cities in
the state of Illinois." The data base itself consists of a set
of trees as in Figure 1c [IMS]. Each node of Figure 1c
corresponds to a "segment" of the hierarchical data base. In
relational terms each node corresponds to a tuple of one of the
first three relations of 1b. The father-son connections
represented by the tree branches of 1c will be present implicitly
in the relations. The relations, however, will probably require
additional structures in order to provide the same efficient
response to hierarchical type queries.

We will avoid the controversy as to whether there are
naturally hierarchical real world data. Let's assume that there
are cases where we want to implement the data hierarchically,
either because it is natural or for compatibility with an
existing system. We have to find an implementation which blends
naturally with other relationally organised data and can provide
a relational view to interested users. The mechanism should also
be as efficient as possible. We propose traces as such a
mechanism [Lowenthal 71]. Traces are "addresses" of nodes in a
tree structure. Suppose Illinois is the 20th root "segment" of
the hierarchical data base. We define traces as in Figure 1d.
Each node is completely characterised by the type (i.e., state,
county or city) and a tuple of numbers which completely defines
the path from the root to the node. The root "segment"
identification is present as the first coordinate of the trace.
Traces completely specify the hierarchical organization. Namely,
given a trace it is simple to obtain the trace of an ancestor
node, or the traces of descendant nodes, or traces of twin
brothers in the hierarchical structure. If we also have a
mapping between traces and where the "segments" are, then we
capture completely the information given by the hierarchical
organization.
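
The navigation properties of traces can be illustrated with tuples of coordinates. The sketch assumes the three-level state/county/city hierarchy of the example, in which the node type follows from the length of the tuple, and assumes that brothers are numbered from 1; the function names are hypothetical and the figure's exact notation is not reproduced.

```python
from typing import List, Optional, Tuple

Trace = Tuple[int, ...]
LEVELS = ("state", "county", "city")  # assumed fixed type tree


def node_type(trace: Trace) -> str:
    """In this fixed hierarchy the depth of the trace determines the type."""
    return LEVELS[len(trace) - 1]


def parent(trace: Trace) -> Optional[Trace]:
    """Trace of the father node; a root segment has none."""
    return trace[:-1] if len(trace) > 1 else None


def child(trace: Trace, k: int) -> Trace:
    """Trace of the kth son."""
    return trace + (k,)


def older_brothers(trace: Trace) -> List[Trace]:
    """Traces of all brothers preceding this node under the same father."""
    return [trace[:-1] + (k,) for k in range(1, trace[-1])]


illinois: Trace = (20,)            # the 20th root segment
county = child(illinois, 3)        # its third county
city = child(county, 5)            # that county's fifth city
```

All of these operations work on the trace alone, without touching the stored segments.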

Suppose we store all state "segments" in a STATE file,
all county "segments" in a COUNTY file and all city "segments" in
a CITY file. We organize the files in a natural way: each record
contains a "segment" of the data base. In this manner we capture
the first three relations of 1b. If we also have a mapping
between traces and indices of the files as in 1e, then we can
reconstruct the rest of the relations, or all the hierarchical
organization. Since each "segment" type is a different file, we
only need to map the coordinates of the trace into a file index.
The node type of the trace will identify the file.

Constructing a practical mapping I(x1,...,xn) as in
Figure 1e for traces is not easy. We can use one level of
indirection by interpreting trace numbers as coordinates of an
array [Lochovsky and Tsichritzis 1974]. The entries in the array
are indices of the files. However, for large data bases the
overhead may be considerable.
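
A minimal sketch of the array indirection follows, assuming one array per segment file and a fixed upper bound on the fan-out at each level (both are assumptions, as are the class and method names). It also shows where the overhead comes from: the array has a cell for every possible trace, whether or not the node exists.

```python
from typing import List, Optional, Tuple

Trace = Tuple[int, ...]


class TraceMap:
    """Maps trace coordinates to record indices of one segment file."""

    def __init__(self, bounds: Tuple[int, ...]):
        self.bounds = bounds          # maximum fan-out at each level
        size = 1
        for b in bounds:
            size *= b
        self.cells: List[Optional[int]] = [None] * size

    def _offset(self, trace: Trace) -> int:
        """Row-major position of the trace, coordinates numbered from 1."""
        if len(trace) != len(self.bounds):
            raise ValueError("trace has the wrong depth for this file")
        off = 0
        for x, b in zip(trace, self.bounds):
            if not 1 <= x <= b:
                raise IndexError("coordinate outside the array bounds")
            off = off * b + (x - 1)
        return off

    def put(self, trace: Trace, file_index: int) -> None:
        self.cells[self._offset(trace)] = file_index

    def get(self, trace: Trace) -> Optional[int]:
        return self.cells[self._offset(trace)]


# CITY file: at most 50 states, 4 counties per state, 3 cities per county.
city_map = TraceMap((50, 4, 3))
city_map.put((20, 3, 2), 7)   # this city's segment is record 7 of CITY
```

Here 600 cells are reserved to locate a single record, which is the kind of overhead that becomes considerable for large data bases.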

Many other types of mappings between trees and files can
be used. The mapping problem is complicated because the data
base is dynamic. If we assume that we do not allocate nodes
which are not present, then the scheme needs to adapt according
to the growing characteristics of the data base. We are working
on the comparative evaluation of mapping schemes [Bernstein and
Tsichritzis 1974]. The problem is similar to hashing. We need
to hash a set of coordinates into an index. The distribution of
the traces is not random. In a hierarchical structure there is a
natural order among brothers and ancestors. Given that a node is
present in the data base, all of its "older" brothers are present
and all of its ancestors are present.

In our implementation of relations based on hierarchies
we plan to have both a hierarchical and a relational view of data
on the same data base [Lochovsky and Tsichritzis 1974]. In such
an environment we can experiment with users' attitudes. In
addition, we plan to investigate the possibility of implementing
relations on top of other existing structures, e.g., hierarchies.
We hope that our system will be ready for the classroom by
September 1974.

[Figure 1: Token hierarchical structure]

The Concurrent Update Problem

In some systems on-line update is allowed by many users.
This type of operation generates very difficult problems of
locking, consistency, deadlock and efficiency. We will discuss
the problem in general and then we will outline our rather
trivial solution.

In order to update a data element many items may have to
be locked. In a system with derived relations there is a
duplication of paths which can be used to access the data. All
these paths should be consistent with the values of the data.
For instance, if a relation R was used to form a join, an update
in R may affect the join. In such a case we need to lock more
than one item before we perform the update. The additional
search-aiding structures need to be changed when the
corresponding values are updated. For instance, the corresponding
inverted file should be changed when there is an update within a
domain. Many user requests need to operate on more than one
tuple (for instance, according to a qualification). As a result
many tuples need to be locked for the completion of the
operation. If we lock and update a tuple at a time, then the
data base may be inconsistent in the middle of the operation.
This is not acceptable. For example, when transferring funds
among accounts, we expect a conservation of all moneys posted.

The number of items we have to lock depends not only on
the request, but also on the level at which we decide to do the
locking. Locking can be done for entire monitors of the DBMS, at
the relation level, at the domain level, at the tuple level, or
at the domain value level. If we lock at a very high level there
is a chance of locking non-essential parts of the data base.
This situation especially affects frequent retrieval operations
which could go on if the lock were not present. On the other
hand, if we lock at a low level we need to lock many more items.
When the number of items to be locked is very high, the locking
operation is time consuming. We also need to be careful to avoid
deadlocks. In addition, the locking requests can become
entangled in fancy ways which generate overhead and considerably
slow down the operation of the system.

We cannot lock many unrelated items all at once, at
least with current hardware. When we lock them one by one we may
need to unlock to avoid deadlock in the presence of conflict with
another locking activity. This may give rise to an "accordion"
effect in which separate locking activities collide and withdraw
alternately. It is very hard for the system to diagnose such a
situation if the locking activities are independent.

In another approach all locking activity in different
parts of the data base can be performed by separate distinct
locking monitors. This is a case of multilevel locking. We
first obtain exclusive use (lock) of the locking monitor which
later does the locking for the request. The locking monitor can
arbitrate between conflicting requests, or even send messages
when a pending locking request can proceed. It is important that
items are locked for the minimum amount of time possible. For
that reason it may be advantageous to touch (making them
unlockable) items before locking. We only lock them when all
needed items can be locked. In this manner we do not interfere
during locking with ongoing retrieval requests.

One of the hardest problems concerns the action taken
when a locking operation cannot proceed. The request should
obviously be blocked. The exact way of blocking a request
depends very heavily on the characteristics of the operating
system. We have at least the following alternatives:

1) Let the requesting process "busy wait".

2) Take advantage of the scheduling algorithm. For example,
by terminating the current time-slice we try again during
the next time-slice.

3) Block the process. The process is given an alarm when
the locking can proceed or when it should try again.

4) "Fake" the whole update request. Delay the actual update
operation. Every subsequent request accessing the data
will have to go through a "posted updates" monitor.

The questions of level of locking, method of locking and
manner of blocking are very important in any system and
especially so in a flexible relational system. It is hard to
give general answers. They depend heavily on the type of
hardware, properties of the operating system, nature of data base
and type of requests.

In both systems we follow a very straightforward and
unsophisticated method of locking. We lock at a high level,
e.g., relation, domain and segment type. In addition, all
locking operations can be performed only by a monitor. For each
lockable item there is a read count and a write lock. Concurrent
reading is allowed but update is exclusive. Other access paths
and inverted files are locked at the same time as data. For
instance, in the relational prototype system when a domain is
locked all derived relations involving the domain are also
locked.

In case of conflict the request is "absolutely" refused.
The higher levels of the system can decide how to interpret the
refusal, what to do and what to tell the user. The monitor
delegates the servicing of the request after it checks the locks.
In this manner the complex search mechanisms can be concurrent
and the bottleneck effect of the monitor is minimized.
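
The scheme just described (a read count and a write lock per item, dependent access paths locked together with the data, and outright refusal on conflict) can be sketched as follows. The class `LockMonitor`, its method names and the item names are assumptions for illustration; a real monitor would also serialize its own entry points, which is omitted here.

```python
from typing import Dict, List, Optional, Set


class LockMonitor:
    """Single monitor holding a read count and a write lock per item."""

    def __init__(self, dependents: Optional[Dict[str, List[str]]] = None):
        # item -> other items (derived relations, inverted files)
        # that must be locked at the same time
        self.dependents = dependents or {}
        self.readers: Dict[str, int] = {}
        self.writers: Set[str] = set()

    def _closure(self, item: str) -> List[str]:
        return [item] + list(self.dependents.get(item, []))

    def lock_read(self, item: str) -> bool:
        items = self._closure(item)
        if any(i in self.writers for i in items):
            return False                      # refused, never queued
        for i in items:
            self.readers[i] = self.readers.get(i, 0) + 1
        return True

    def lock_write(self, item: str) -> bool:
        items = self._closure(item)
        if any(i in self.writers or self.readers.get(i, 0) > 0
               for i in items):
            return False                      # refused, never queued
        self.writers.update(items)
        return True

    def unlock_read(self, item: str) -> None:
        for i in self._closure(item):
            self.readers[i] -= 1

    def unlock_write(self, item: str) -> None:
        self.writers.difference_update(self._closure(item))


# Locking the domain also locks a derived relation that involves it.
monitor = LockMonitor({"EMP.dept": ["TOYS_JOIN"]})
first_reader = monitor.lock_read("EMP.dept")
second_reader = monitor.lock_read("EMP.dept")     # concurrent reading
writer_refused = monitor.lock_write("TOYS_JOIN")  # readers present
```

Because a refusal is returned immediately, the caller (a higher level of the system) decides whether to retry, wait or report to the user.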

We  do  not  have  any  illusions  about  such  a  simple  scheme. 
It  is  very  primitive,  but  we  believe  it  is  effective.  Very 
sophisticated schemes can incur tremendous overhead.

Most of the issues involved in implementing relations
need experimentation. Without experimentation it is difficult to
evaluate techniques. The main purpose of our work is to develop
and experiment with techniques. Systems are only produced to
support the methods. Successful techniques can be later adopted
for commercial systems and large data bases.


We are experimenting with tools for the implementation
of relational systems. So far we are looking at two distinct
mechanisms, namely pointer arrays and hierarchies. Both pointer
arrays and hierarchies are subsumed in the facilities proposed in
the DBTG proposal. This offers another way to look at our
research. We investigate the facilities of the DBTG which are
appropriate as implementation tools in order to give the users a
relational viewpoint.

ON IMPLEMENTATION OF RELATIONS